منابع مشابه
Improving ROUGE for Timeline Summarization
Current evaluation metrics for timeline summarization either ignore the temporal aspect of the task or require strict date matching. We introduce variants of ROUGE that allow alignment of daily summaries via temporal distance or semantic similarity. We argue for the suitability of these variants in a theoretical analysis and demonstrate it in a battery of task-specific tests.
متن کاملPlagiarism Detection using ROUGE and WordNet
With the arrival of digital era and Internet, the lack of information control provides an incentive for people to freely use any content available to them. Plagiarism occurs when users fail to credit the original owner for the content referred to, and such behavior leads to violation of intellectual property. Two main approaches to plagiarism detection are fingerprinting and term occurrence; ho...
متن کاملROUGE: A Package For Automatic Evaluation Of Summaries
ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans...
متن کاملBetter Summarization Evaluation with Word Embeddings for ROUGE
ROUGE is a widely adopted, automatic evaluation measure for text summarization. While it has been shown to correlate well with human judgements, it is biased towards surface lexical similarities. This makes it unsuitable for the evaluation of abstractive summarization, or summaries with substantial paraphrasing. We study the effectiveness of word embeddings to overcome this disadvantage of ROUG...
متن کاملThe limits of automatic summarisation according to ROUGE
This paper discusses some central caveats of summarisation, incurred in the use of the ROUGE metric for evaluation, with respect to optimal solutions. The task is NPhard, of which we give the first proof. Still, as we show empirically for three central benchmark datasets for the task, greedy algorithms empirically seem to perform optimally according to the metric. Additionally, overall quality ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Voix et Images
سال: 2001
ISSN: 0318-9201,1705-933X
DOI: 10.7202/201570ar